Overview

Dataset statistics

Number of variables: 11
Number of observations: 53197
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.2 MiB
Average record size in memory: 142.0 B
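The overview counts can be recomputed from raw rows. The sketch below uses a hypothetical helper, `dataset_statistics` (not part of pandas-profiling). It assumes missing values are stored as `None` and counts a row as a duplicate when it repeats an earlier row, which is the convention of pandas' `DataFrame.duplicated()` with its default `keep="first"`. The report does not state that it uses exactly this definition.

```python
from typing import Any, Dict, Sequence


def dataset_statistics(rows: Sequence[Sequence[Any]]) -> Dict[str, float]:
    """Hypothetical helper: overview counts for a list of equal-length rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars

    # Assumption: a missing cell is represented as None.
    missing = sum(1 for row in rows for cell in row if cell is None)

    # Count rows that repeat an earlier row (first occurrences are not counted).
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


rows = [(50001, "S", 0), (50002, "D", None), (50001, "S", 0)]
print(dataset_statistics(rows))
```

The average record size is the total memory divided by the number of observations: 7.2 MiB × 1,048,576 / 53,197 ≈ 141.9 B, consistent with the rounded 142.0 B above.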

Variable types

NUM: 8
BOOL: 2
CAT: 1

Reproduction

Analysis started: 2020-03-01 12:09:52.661693
Analysis finished: 2020-03-01 12:12:04.192799
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration: config.yaml

Warnings

LowUserPrice is highly skewed (γ1 = 72.56560029) [Skewed]
LowNetPrice is highly skewed (γ1 = 78.1880364) [Skewed]
ReleaseNumber has 4550 (8.6%) zeros [Zeros]
PriceReg has 976 (1.8%) zeros [Zeros]
LowUserPrice has 6431 (12.1%) zeros [Zeros]
LowNetPrice has 1579 (3.0%) zeros [Zeros]
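The warnings above follow simple per-variable rules. The sketch below assumes a skewness threshold of 20 in absolute value and a zeros threshold of 1% of observations. Both thresholds are assumptions chosen to agree with this report (PriceReg, with 1.8% zeros and skewness 7.3, gets only a Zeros warning; ItemCount, with 0.1% zeros, gets none); they are not values read from config.yaml. `variable_warnings` is a hypothetical helper.

```python
from typing import List

# Assumed thresholds (not confirmed by the report's configuration):
# |skewness| > 20 triggers "Skewed"; a zeros share above 1% triggers "Zeros".
SKEWNESS_THRESHOLD = 20.0
ZEROS_SHARE_THRESHOLD = 0.01


def variable_warnings(name: str, skewness: float, n_zeros: int, n: int) -> List[str]:
    """Hypothetical re-implementation of the Skewed and Zeros warnings."""
    warnings = []
    if abs(skewness) > SKEWNESS_THRESHOLD:
        warnings.append(f"{name} is highly skewed (γ1 = {skewness}) [Skewed]")
    share = n_zeros / n
    if share > ZEROS_SHARE_THRESHOLD:
        warnings.append(f"{name} has {n_zeros} ({share:.1%}) zeros [Zeros]")
    return warnings


# Inputs taken from the variable sections of this report (n = 53197).
for args in [("LowUserPrice", 72.56560029, 6431),
             ("PriceReg", 7.318281322, 976),
             ("ItemCount", 5.468709865, 53)]:
    print(variable_warnings(*args, n=53197))
```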

Variables

SKU_number
Real number (ℝ≥0)

UNIQUE
Distinct count: 53197
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 652619.6144
Minimum: 50001
Maximum: 3959831
Zeros: 0
Zeros (%): 0.0%
Memory size: 415.7 KiB
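Distinct count, Unique (%) and Zeros can all be derived from one frequency count. In this minimal sketch, Unique (%) is taken as the distinct count divided by the number of observations, which matches the figures in this report (for example, 52505 / 53197 ≈ 98.7% for StrengthFactor), and the UNIQUE badge is assumed to mean that every value occurs exactly once. `distinct_summary` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Sequence


def distinct_summary(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: distinct-value and zero counts for one column."""
    n = len(values)
    counts = Counter(values)
    distinct = len(counts)
    zeros = counts.get(0, 0)  # 0 and 0.0 hash equal, so both are counted
    return {
        "distinct_count": distinct,
        "unique_pct": 100.0 * distinct / n,
        "is_unique": distinct == n,  # assumed meaning of the UNIQUE badge
        "zeros": zeros,
        "zeros_pct": 100.0 * zeros / n,
    }


print(distinct_summary([50001, 50002, 50003]))
print(distinct_summary([0, 0.0, 2, 3]))
```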

Quantile statistics

Minimum: 50001
5th percentile: 54671.6
Q1: 170005
Median: 540957
Q3: 759710
95th percentile: 2407744.2
Maximum: 3959831
Range: 3909830
Interquartile range (IQR): 589705
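pandas' `Series.quantile` interpolates linearly between order statistics by default, which would explain fractional percentiles such as 54671.6 for an integer column. The sketch below assumes the report uses that default method and reimplements it in plain Python rather than calling pandas; `quantile` and `quantile_statistics` are hypothetical names.

```python
import math
from typing import Dict, Sequence


def quantile(sorted_values: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between the two nearest order statistics."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_statistics(values: Sequence[float]) -> Dict[str, float]:
    xs = sorted(values)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "minimum": xs[0],
        "p5": quantile(xs, 0.05),
        "q1": q1,
        "median": quantile(xs, 0.5),
        "q3": q3,
        "p95": quantile(xs, 0.95),
        "maximum": xs[-1],
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,  # interquartile range
    }


print(quantile_statistics([1, 2, 3, 4]))
```

As a consistency check on the table above, Range = 3959831 − 50001 = 3909830 and IQR = 759710 − 170005 = 589705.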

Descriptive statistics

Standard deviation: 687327.1684
Coefficient of variation (CV): 1.053181904
Kurtosis: 4.169920642
Mean: 652619.6144
Mean Absolute Deviation (MAD): 453234.278
Skewness: 2.067154779
Sum: 3.471740563e+10
Variance: 4.724186364e+11
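Most of these figures follow from the mean and the sample variance; for example, CV = 687327.1684 / 652619.6144 ≈ 1.0532 and Variance = 687327.1684² ≈ 4.7242e+11. The MAD figures in this report are not multiples of 0.25 even for integer columns such as ReleaseNumber, which a median absolute deviation of integers would have to be, so the sketch computes the mean absolute deviation around the mean (what pandas' former `Series.mad()` returned). Using n - 1 in the variance is an assumption matching pandas' default `ddof=1`.

```python
import math
from typing import Dict, Sequence


def descriptive_statistics(values: Sequence[float]) -> Dict[str, float]:
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sample variance with n - 1 in the denominator (ddof=1, the pandas default).
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    std = math.sqrt(variance)
    return {
        "std": std,
        "cv": std / mean,  # coefficient of variation: std relative to the mean
        "mean": mean,
        # Mean absolute deviation around the mean.
        "mad": sum(abs(x - mean) for x in values) / n,
        "sum": total,
        "variance": variance,
    }


print(descriptive_statistics([1, 2, 3, 4, 5]))
```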

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[ 50001. 55752.5 58568.5 60002. 61260.5 ... 3635024.5 3744666. 3834147. 3959830. 3959831. ]]

Common values
Value | Count | Frequency (%)
530429 | 1 | < 0.1%
545951 | 1 | < 0.1%
755396 | 1 | < 0.1%
206001 | 1 | < 0.1%
119983 | 1 | < 0.1%
902317 | 1 | < 0.1%
642220 | 1 | < 0.1%
652459 | 1 | < 0.1%
105638 | 1 | < 0.1%
218677 | 1 | < 0.1%
Other values (53187) | 53187 | > 99.9%
 
Minimum 5 values
Value | Count | Frequency (%)
50001 | 1 | < 0.1%
50002 | 1 | < 0.1%
50003 | 1 | < 0.1%
50004 | 1 | < 0.1%
50006 | 1 | < 0.1%
 
Maximum 5 values
Value | Count | Frequency (%)
3959831 | 1 | < 0.1%
3959829 | 1 | < 0.1%
3951300 | 1 | < 0.1%
3951290 | 1 | < 0.1%
3951231 | 1 | < 0.1%
 

MarketingType
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 415.7 KiB

Common values
Value | Count | Frequency (%)
S | 28633 | 53.8%
D | 24564 | 46.2%
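Each common-values table lists the most frequent values and then folds the rest into one "Other values (k)" row. For SKU_number, 10 listed values plus 53187 others give the 53197 distinct values, so the sketch assumes a top-10 cut-off. `common_values` is a hypothetical helper built on `collections.Counter`; values with equal counts keep their first-seen order, which may differ from the report's ordering.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def common_values(values: Sequence[Hashable], top: int = 10) -> List[Tuple[str, int, float]]:
    """(value, count, frequency %) rows for the `top` most frequent values, plus an
    "Other values (k)" row summarising the remaining k distinct values."""
    n = len(values)
    ranked = Counter(values).most_common()
    rows = [(str(v), c, 100.0 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / n))
    return rows


for value, count, pct in common_values(["S", "S", "D", "S", "D"]):
    print(value, count, f"{pct:.1f}%")
```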
 

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode categories
Value | Count | Frequency (%)
Uppercase_Letter | 2 | 100.0%

Unicode scripts
Value | Count | Frequency (%)
Latin | 2 | 100.0%

Unicode blocks
Value | Count | Frequency (%)
ASCII | 2 | 100.0%
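The length figures and the character tables can be reproduced with the standard library. This sketch assumes the counts in the character tables are numbers of distinct characters (MarketingType uses exactly two, S and D). The Unicode script (Latin) has no standard-library lookup, so it is omitted, and ASCII membership is checked by code point. The long category names come from the Unicode property value aliases; only a few are mapped here.

```python
import unicodedata
from collections import Counter
from typing import Dict, Sequence

# Long names for a few Unicode general-category codes; extend as needed.
CATEGORY_NAMES = {"Lu": "Uppercase_Letter", "Ll": "Lowercase_Letter", "Nd": "Decimal_Number"}


def length_and_characters(values: Sequence[str]) -> Dict[str, object]:
    lengths = [len(v) for v in values]
    distinct_chars = set("".join(values))
    categories = Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in distinct_chars
    )
    return {
        "max_length": max(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "min_length": min(lengths),
        "categories": dict(categories),
        # Characters in the ASCII block have code points below 128.
        "ascii_chars": sum(1 for ch in distinct_chars if ord(ch) < 128),
    }


print(length_and_characters(["S", "D", "S", "S"]))
```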
 

ReleaseNumber
Real number (ℝ≥0)

ZEROS
Distinct count: 58
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.125796567
Minimum: 0
Maximum: 99
Zeros: 4550
Zeros (%): 8.6%
Memory size: 415.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 3
Q3: 6
95th percentile: 11
Maximum: 99
Range: 99
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.895184175
Coefficient of variation (CV): 0.9441047593
Kurtosis: 31.66739068
Mean: 4.125796567
Mean Absolute Deviation (MAD): 2.731417112
Skewness: 3.331137132
Sum: 219480
Variance: 15.17245976
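Skewness and kurtosis here are most likely the bias-adjusted sample estimators computed by pandas' `Series.skew()` and `Series.kurt()`, with kurtosis reported as excess kurtosis (about 0 for a normal distribution); that attribution is an assumption. The positive skewness of ReleaseNumber (3.33) reflects its long right tail: the 95th percentile is 11 but the maximum is 99.

```python
import math
from typing import Sequence


def _central_sums(values: Sequence[float]):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    m3 = sum((x - mean) ** 3 for x in values)
    m4 = sum((x - mean) ** 4 for x in values)
    return n, m2, m3, m4


def skewness(values: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness (needs n >= 3)."""
    n, m2, m3, _ = _central_sums(values)
    return n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5


def excess_kurtosis(values: Sequence[float]) -> float:
    """Bias-corrected sample excess kurtosis (needs n >= 4)."""
    n, m2, _, m4 = _central_sums(values)
    return (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


print(skewness([1, 2, 3, 4, 5]), excess_kurtosis([1, 2, 3, 4, 5]))
print(skewness([0, 1, 1, 2, 3, 99]))  # long right tail, so positive
```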

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[ 0. 0.5 1.5 2.5 3.5 ... 25.5 35.5 44.5 52.5 99. ]]
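The fixed-size histogram splits the observed range into 10 equal-width bins. The sketch below follows the convention of `numpy.histogram` (half-open bins, with the last bin also closed on the right); that this is how the plot was produced is an assumption. The variable-size "bayesian blocks" bins are not reproduced. `fixed_width_histogram` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(values: Sequence[float], bins: int = 10) -> Tuple[List[float], List[int]]:
    """Equal-width bins over [min, max]; each bin is [a, b) except the last,
    which also includes the maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values are equal, so use a single bin.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Clamp so the maximum (and any rounding overshoot) lands in the last bin.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram(list(range(11)), bins=5)
print(edges)
print(counts)
```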

Common values
Value | Count | Frequency (%)
2 | 11278 | 21.2%
3 | 7333 | 13.8%
1 | 6715 | 12.6%
4 | 5241 | 9.9%
0 | 4550 | 8.6%
5 | 4176 | 7.9%
6 | 3336 | 6.3%
7 | 2630 | 4.9%
8 | 2057 | 3.9%
9 | 1590 | 3.0%
Other values (48) | 4291 | 8.1%
 
Minimum 5 values
Value | Count | Frequency (%)
0 | 4550 | 8.6%
1 | 6715 | 12.6%
2 | 11278 | 21.2%
3 | 7333 | 13.8%
4 | 5241 | 9.9%
 
Maximum 5 values
Value | Count | Frequency (%)
99 | 1 | < 0.1%
97 | 1 | < 0.1%
82 | 1 | < 0.1%
66 | 1 | < 0.1%
58 | 1 | < 0.1%
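The last two tables list the five smallest and the five largest distinct values with their counts. For ReleaseNumber the smallest values (0 to 4) are also among the most common ones, while each of the largest occurs once. `extreme_values` is a hypothetical helper that builds both tables from a frequency count.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[float, int, float]


def extreme_values(values: Sequence[float], k: int = 5) -> Tuple[List[Row], List[Row]]:
    """The k smallest and k largest distinct values as (value, count, frequency %)."""
    n = len(values)
    counts = Counter(values)
    ordered = sorted(counts)

    def rows(keys) -> List[Row]:
        return [(v, counts[v], 100.0 * counts[v] / n) for v in keys]

    return rows(ordered[:k]), rows(ordered[::-1][:k])


smallest, largest = extreme_values([0, 1, 1, 2, 2, 2, 3, 99, 97], k=3)
print(smallest)
print(largest)
```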
 
New_Release_Flag
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 415.7 KiB

Common values
Value | Count | Frequency (%)
1 | 41932 | 78.8%
0 | 11265 | 21.2%
 

StrengthFactor
Real number (ℝ≥0)

Distinct count: 52505
Unique (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1216756.574
Minimum: 68
Maximum: 16669658
Zeros: 0
Zeros (%): 0.0%
Memory size: 415.7 KiB

Quantile statistics

Minimum: 68
5th percentile: 35014.6
Q1: 243225
Median: 715500
Q3: 1552432
95th percentile: 4136155.8
Maximum: 16669658
Range: 16669590
Interquartile range (IQR): 1309207

Descriptive statistics

Standard deviation: 1524906.895
Coefficient of variation (CV): 1.253255522
Kurtosis: 12.55169626
Mean: 1216756.574
Mean Absolute Deviation (MAD): 1027351.915
Skewness: 2.892325144
Sum: 6.472779946e+10
Variance: 2.32534104e+12

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[6.80000000e+01 1.73450000e+03 5.99805000e+04 1.14373000e+05 1.47962500e+05 ... 7.42892900e+06 9.80970050e+06 1.12440185e+07 1.27598820e+07 1.66696580e+07]]

Common values
Value | Count | Frequency (%)
3158 | 3 | < 0.1%
586672 | 3 | < 0.1%
12815 | 3 | < 0.1%
129992 | 3 | < 0.1%
29717 | 3 | < 0.1%
209753 | 3 | < 0.1%
58655 | 3 | < 0.1%
27028 | 3 | < 0.1%
1080008 | 2 | < 0.1%
1059564 | 2 | < 0.1%
Other values (52495) | 53169 | 99.9%
 
Minimum 5 values
Value | Count | Frequency (%)
68 | 1 | < 0.1%
113 | 1 | < 0.1%
193 | 1 | < 0.1%
200 | 1 | < 0.1%
227 | 1 | < 0.1%
 
Maximum 5 values
Value | Count | Frequency (%)
16669658 | 1 | < 0.1%
16428309 | 1 | < 0.1%
16411446 | 1 | < 0.1%
15661005 | 1 | < 0.1%
15599056 | 1 | < 0.1%
 

PriceReg
Real number (ℝ≥0)

ZEROS
Distinct count: 7464
Unique (%): 14.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 99.00402598
Minimum: 0
Maximum: 3986.31
Zeros: 976
Zeros (%): 1.8%
Memory size: 415.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 18.928
Q1: 49.95
Median: 78.95
Q3: 127.95
95th percentile: 242.95
Maximum: 3986.31
Range: 3986.31
Interquartile range (IQR): 78

Descriptive statistics

Standard deviation: 80.63133269
Coefficient of variation (CV): 0.8144247862
Kurtosis: 215.1874619
Mean: 99.00402598
Mean Absolute Deviation (MAD): 54.46563708
Skewness: 7.318281322
Sum: 5266717.17
Variance: 6501.411811

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[0.000000e+00 5.000000e-03 3.975000e+00 4.020000e+00 4.965000e+00 ... 5.987900e+02 5.992450e+02 6.135350e+02 1.086105e+03 3.986310e+03]]

Common values
Value | Count | Frequency (%)
0 | 976 | 1.8%
49.95 | 727 | 1.4%
39.95 | 566 | 1.1%
49.99 | 503 | 0.9%
59.95 | 497 | 0.9%
69.95 | 401 | 0.8%
44.95 | 392 | 0.7%
39.99 | 375 | 0.7%
79.95 | 360 | 0.7%
75 | 333 | 0.6%
Other values (7454) | 48067 | 90.4%
 
Minimum 5 values
Value | Count | Frequency (%)
0 | 976 | 1.8%
0.01 | 1 | < 0.1%
0.34 | 1 | < 0.1%
0.5 | 1 | < 0.1%
1 | 2 | < 0.1%
 
Maximum 5 values
Value | Count | Frequency (%)
3986.31 | 1 | < 0.1%
2800 | 1 | < 0.1%
2707.22 | 1 | < 0.1%
2645.3 | 1 | < 0.1%
2432.64 | 1 | < 0.1%
 

ReleaseYear
Real number (ℝ≥0)

Distinct count: 66
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2005.971389
Minimum: 1945
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Memory size: 415.7 KiB

Quantile statistics

Minimum: 1945
5th percentile: 1995
Q1: 2003
Median: 2007
Q3: 2010
95th percentile: 2013
Maximum: 2016
Range: 71
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 6.098044798
Coefficient of variation (CV): 0.003039946048
Kurtosis: 6.202119143
Mean: 2005.971389
Mean Absolute Deviation (MAD): 4.461849698
Skewness: -1.772564234
Sum: 106711660
Variance: 37.18615036

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[1945. 1961.5 1968.5 1977.5 1981.5 ... 2012.5 2013.5 2014.5 2015.5 2016. ]]

Common values
Value | Count | Frequency (%)
2010 | 4367 | 8.2%
2008 | 4200 | 7.9%
2009 | 4126 | 7.8%
2007 | 4064 | 7.6%
2006 | 3899 | 7.3%
2011 | 3886 | 7.3%
2005 | 3563 | 6.7%
2012 | 3436 | 6.5%
2004 | 3008 | 5.7%
2013 | 2585 | 4.9%
Other values (56) | 16063 | 30.2%
 
Minimum 5 values
Value | Count | Frequency (%)
1945 | 1 | < 0.1%
1950 | 2 | < 0.1%
1952 | 2 | < 0.1%
1953 | 1 | < 0.1%
1954 | 1 | < 0.1%
 
Maximum 5 values
Value | Count | Frequency (%)
2016 | 48 | 0.1%
2015 | 397 | 0.7%
2014 | 1655 | 3.1%
2013 | 2585 | 4.9%
2012 | 3436 | 6.5%
 

ItemCount
Real number (ℝ≥0)

Distinct count: 379
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.94165084
Minimum: 0
Maximum: 1426
Zeros: 53
Zeros (%): 0.1%
Memory size: 415.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 13
Q1: 22
Median: 34
Q3: 53
95th percentile: 107
Maximum: 1426
Range: 1426
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 37.51590055
Coefficient of variation (CV): 0.8537662976
Kurtosis: 82.52224851
Mean: 43.94165084
Mean Absolute Deviation (MAD): 23.40039562
Skewness: 5.468709865
Sum: 2337564
Variance: 1407.442794

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[0.000e+00 5.000e-01 4.500e+00 6.500e+00 8.500e+00 ... 2.730e+02 3.075e+02 3.960e+02 6.565e+02 1.426e+03]]

Common values
Value | Count | Frequency (%)
20 | 1344 | 2.5%
18 | 1344 | 2.5%
21 | 1341 | 2.5%
22 | 1326 | 2.5%
19 | 1318 | 2.5%
24 | 1304 | 2.5%
25 | 1249 | 2.3%
23 | 1230 | 2.3%
26 | 1187 | 2.2%
17 | 1175 | 2.2%
Other values (369) | 40379 | 75.9%
 
Minimum 5 values
Value | Count | Frequency (%)
0 | 53 | 0.1%
1 | 19 | < 0.1%
2 | 18 | < 0.1%
3 | 23 | < 0.1%
4 | 29 | 0.1%
 
Maximum 5 values
Value | Count | Frequency (%)
1426 | 1 | < 0.1%
930 | 1 | < 0.1%
851 | 1 | < 0.1%
827 | 1 | < 0.1%
816 | 1 | < 0.1%
 

LowUserPrice
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 10425
Unique (%): 19.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56.45553715
Minimum: 0
Maximum: 14140.21
Zeros: 6431
Zeros (%): 12.1%
Memory size: 415.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 20.19
Median: 44
Q3: 79.59
95th percentile: 146.05
Maximum: 14140.21
Range: 14140.21
Interquartile range (IQR): 59.4

Descriptive statistics

Standard deviation: 99.00628973
Coefficient of variation (CV): 1.753703794
Kurtosis: 8773.674313
Mean: 56.45553715
Mean Absolute Deviation (MAD): 38.68784116
Skewness: 72.56560029
Sum: 3003265.21
Variance: 9802.245405

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[0.000000e+00 1.500000e+00 3.355000e+00 3.815000e+00 3.865000e+00 ... 3.497400e+02 6.040500e+02 1.209875e+03 5.039290e+03 1.414021e+04]]

Common values
Value | Count | Frequency (%)
0 | 6431 | 12.1%
4 | 414 | 0.8%
33.99 | 203 | 0.4%
43.99 | 185 | 0.3%
53.99 | 182 | 0.3%
58.99 | 153 | 0.3%
63.99 | 152 | 0.3%
28.99 | 137 | 0.3%
48.99 | 133 | 0.3%
78.99 | 131 | 0.2%
Other values (10415) | 45076 | 84.7%
 
Minimum 5 values
Value | Count | Frequency (%)
0 | 6431 | 12.1%
3 | 1 | < 0.1%
3.33 | 1 | < 0.1%
3.38 | 1 | < 0.1%
3.4 | 4 | < 0.1%
 
Maximum 5 values
Value | Count | Frequency (%)
14140.21 | 1 | < 0.1%
7781 | 1 | < 0.1%
5272.4 | 1 | < 0.1%
4806.18 | 1 | < 0.1%
3998.99 | 1 | < 0.1%
 

LowNetPrice
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 11157
Unique (%): 21.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.33542343
Minimum: 0
Maximum: 19138.79
Zeros: 1579
Zeros (%): 3.0%
Memory size: 415.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 6.39
Q1: 18.71
Median: 36.08
Q3: 56.98
95th percentile: 111.954
Maximum: 19138.79
Range: 19138.79
Interquartile range (IQR): 38.27

Descriptive statistics

Standard deviation: 139.0497429
Coefficient of variation (CV): 2.937540913
Kurtosis: 8523.387876
Mean: 47.33542343
Mean Absolute Deviation (MAD): 30.23378489
Skewness: 78.1880364
Sum: 2518102.52
Variance: 19334.83101

[Histogram with fixed-size bins (bins=10)]
[Histogram with variable-size bins ("bayesian blocks" binning strategy): bins=[0.000000e+00 9.750000e-01 3.940000e+00 3.995000e+00 4.005000e+00 ... 6.749200e+02 6.776450e+02 1.059115e+03 2.670110e+03 1.913879e+04]]

Common values
Value | Count | Frequency (%)
0 | 1579 | 3.0%
23.99 | 405 | 0.8%
28.99 | 347 | 0.7%
33.99 | 340 | 0.6%
18.99 | 314 | 0.6%
13.99 | 313 | 0.6%
38.99 | 266 | 0.5%
43.99 | 256 | 0.5%
8.99 | 235 | 0.4%
53.99 | 233 | 0.4%
Other values (11147) | 48909 | 91.9%
 
Minimum 5 values
Value | Count | Frequency (%)
0 | 1579 | 3.0%
1.95 | 1 | < 0.1%
2.44 | 1 | < 0.1%
2.68 | 1 | < 0.1%
3 | 2 | < 0.1%
 
Maximum 5 values
Value | Count | Frequency (%)
19138.79 | 1 | < 0.1%
10003.97 | 1 | < 0.1%
9833.28 | 1 | < 0.1%
9828.14 | 1 | < 0.1%
7781.01 | 2 | < 0.1%
 

SoldFlag
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 415.7 KiB

Common values
Value | Count | Frequency (%)
0 | 44100 | 82.9%
1 | 9097 | 17.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for scale factors that are positive; a negative factor flips the sign), so for an exactly linear relationship with nonzero slope, the steepness of the line does not affect r, only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
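The definition above translates directly into code. This is a minimal plain-Python sketch with a hypothetical function name, `pearson_r`; pandas' `DataFrame.corr(method="pearson")` computes the same quantity. The second call illustrates the invariance under positive rescaling and shifting.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/n (or 1/(n - 1)) factors cancel, so plain sums are enough.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
print(pearson_r(x, y))
print(pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y]))  # same value
```

r is undefined when either variable is constant; this sketch then raises `ZeroDivisionError`.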

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
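Computing ρ as Pearson's r on the rank variables needs only a ranking step. In this sketch tied values share the average of the ranks they span, which is also the default of pandas' `Series.rank()` (`method="average"`). The function names are hypothetical. The cubic example shows ρ = 1 for a monotonic but nonlinear relationship, where r is below 1.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r applied to the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```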

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
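The formula above is the τ-a variant. A direct O(n²) sketch follows, with a hypothetical function name. Library implementations such as `scipy.stats.kendalltau` (which pandas relies on for `corr(method="kendall")`) default to τ-b, which adjusts the denominator for ties, so results can differ when the data contain ties.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant: 1/3
```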

Missing values

Sample

First rows

SKU_number | MarketingType | ReleaseNumber | New_Release_Flag | StrengthFactor | PriceReg | ReleaseYear | ItemCount | LowUserPrice | LowNetPrice | SoldFlag
0266198S00125590132.5020042627.5710.990
12405151S0020840470.0019882115.9940.021
2528384D9150339324.95201353149.73123.940
3571691S41136125539.9520056710.597.590
42363274D103760024144.0020041343.9433.990
5919625D71415071257.7520105086.9945.540
6162332S10495212115.002009140.0081.090
7313692D00368765037.3320032945.524.330
8621965S103417349.9520104226.3330.980
9109242S51242521743.0320052041.4945.620

Last rows

SKU_number | MarketingType | ReleaseNumber | New_Release_Flag | StrengthFactor | PriceReg | ReleaseYear | ItemCount | LowUserPrice | LowNetPrice | SoldFlag
53187586770D0061803838.0020023233.9815.981
53188895228S2163767354.9520143149.3742.280
53189154036S3125795119.5020094063.3069.980
53190747156S0075883542.001998350.0021.470
53191639520S1064799067.8019962980.5551.080
531923059530S0060752699.95200322253.9299.760
5319357540D11136237830.6020068793.9818.971
531942471534D21604613040.951996160.0032.480
53195567658D81973353105.9520084552.6923.990
53196183696S314138946242.9520061913.9053.890